Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry artifact rename if it fails #4001

Merged
merged 2 commits into from
Aug 31, 2024

Conversation

nhz2
Copy link
Contributor

@nhz2 nhz2 commented Aug 20, 2024

Hopefully fixes #3822

This uses a strategy from https://github.com/isaacs/node-graceful-fs/blob/234379906b7d2f4c9cfeb412d2516f42b0fb4953/polyfills.js#L87-L95 to retry the rename for about a minute to work around anti-virus software temporarily locking a directory with newly created files. This strategy is also used in vscode, Ref: microsoft/vscode#188899

I am not restricting this fix to only Windows, because similar issues can happen on Linux, Ref: isaacs/node-graceful-fs#133

@nhz2 nhz2 marked this pull request as ready for review August 20, 2024 04:42
@nhz2
Copy link
Contributor Author

nhz2 commented Aug 20, 2024

@xlxs4 if you can still reproduce the problem you were having in #3822, could you give this PR a try to see if it helps at all?

if retry < max_num_retries && err ∈ (Base.UV_EACCES, Base.UV_EPERM, Base.UV_EBUSY)
sleep(sleep_amount)
sleep_amount = min(sleep_amount*2.0, max_sleep_amount)
retry += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this can take up to 60s, I think we should print something out to the user. Something like:

# Warn once, and only if we've slept a non-trivial amount of time already
if retry == 8
    @warn("Artifact installation failed, retrying for up to 60s.  Check to see if an anti-virus may be interfering.")
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is hit during the new parallel artifact download process, then outputting a log will mess with the multiprogress printing (sorry). We'll need some way to keep that tidy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
How about adding other status messages to the right of the progress bar where it currently has "XX MiB/XX MiB" like "unpacking", "verifying tree hash", and "installing"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the location works but in the default happy path I wouldn't show any further detail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So only show a status message if an artifact is stuck at a stage for over 1 second?

To do the actual printing, can I add a status::String field to

Base.@kwdef mutable struct MiniProgressBar

that gets appended to progress_text in?

progress_text = if p.mode == :percentage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good but I'd do it such that if status is non empty just show the status string after the name and not the progress bar or ending ratio. Just so that we're more likely to fit in the terminal width.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4005 has landed. Thanks @nhz2. So I believe this is good to go now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to backport this to 1.10 & 1.11, but when we do we'll want to add a log message about it taking a long time, as suggested above, given the parallel download thing isn't going to be backported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to the time it already takes to download and unpack artifacts on Windows with an aggressive anti-virus I don't think one minute extra is something to create a warning about, as long as the artifact eventually gets installed.

@xlxs4
Copy link

xlxs4 commented Aug 21, 2024

@xlxs4 if you can still reproduce the problem you were having in #3822, could you give this PR a try to see if it helps at all?

I haven't been able to repro the issue... will let you know if I do

@IanButterworth
Copy link
Member

As this is currently, the status that will print will explain the step that's taking a long time, but won't mention that it could be the antivirus causing it. Should that be changed? Might be annoying to see the same long explanation multiple times as I assume the issue will affect multiple parallel install tasks?

@nhz2
Copy link
Contributor Author

nhz2 commented Aug 31, 2024

I was able to test this PR using LD_PRELOAD to simulate an unreliable rename. If I make rename only sometimes fail, installing works. If I make rename always error with EBUSY, I get the moving to artifact store message for a minute, then Pkg tries to download from the next URL, and I get moving to artifact store again for a minute. Finally, I get an error:

julia> Pkg.pkg"add x265_jll"
   Resolving package versions...
  Installing artifacts  ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1
ERROR: Failed to install some artifacts:

Unable to automatically download/install artifact 'x265' from sources listed in '/home/nathan/.julia/packages/x265_jll/70izA/Artifacts.toml'.
Sources attempted:
- https://pkg.julialang.org/artifact/00023f2f62a04af7531278ce1a689ff93800ffa1
    Error: IOError: rename of "/home/nathan/.julia/artifacts/jl_Aly3ZD" to "/home/nathan/.julia/artifacts/00023f2f62a04af7531278ce1a689ff93800ffa1": resource busy or locked (EBUSY)
- https://github.com/JuliaBinaryWrappers/x265_jll.jl/releases/download/x265-v3.6.0+0/x265.v3.6.0.x86_64-linux-gnu.tar.gz
    Error: IOError: rename of "/home/nathan/.julia/artifacts/jl_p180ux" to "/home/nathan/.julia/artifacts/00023f2f62a04af7531278ce1a689ff93800ffa1": resource busy or locked (EBUSY)
Stacktrace:
  [1] pkgerror(msg::String)
    @ Pkg.Types ~/github/Pkg.jl/src/Types.jl:68
  [2] download_artifacts(ctx::Pkg.Types.Context; platform::Base.BinaryPlatforms.Platform, julia_version::VersionNumber, verbose::Bool)
    @ Pkg.Operations ~/github/Pkg.jl/src/Operations.jl:978
  [3] download_artifacts
    @ ~/github/Pkg.jl/src/Operations.jl:827 [inlined]
  [4] add(ctx::Pkg.Types.Context, pkgs::Vector{…}, new_git::Set{…}; allow_autoprecomp::Bool, preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform, target::Symbol)
    @ Pkg.Operations ~/github/Pkg.jl/src/Operations.jl:1656
  [5] add
    @ ~/github/Pkg.jl/src/Operations.jl:1621 [inlined]
  [6] add(ctx::Pkg.Types.Context, pkgs::Vector{…}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform, target::Symbol, allow_autoprecomp::Bool, kwargs::@Kwargs{…})
    @ Pkg.API ~/github/Pkg.jl/src/API.jl:306
  [7] add(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{IO}, kwargs::@Kwargs{})
    @ Pkg.API ~/github/Pkg.jl/src/API.jl:159
  [8] add(pkgs::Vector{Pkg.Types.PackageSpec})
    @ Pkg.API ~/github/Pkg.jl/src/API.jl:148
  [9] do_cmd(command::Pkg.REPLMode.Command, io::IOContext{IO})
    @ Pkg.REPLMode ~/github/Pkg.jl/src/REPLMode/REPLMode.jl:407
 [10] do_cmds(commands::Vector{Pkg.REPLMode.Command}, io::IOContext{IO})
    @ Pkg.REPLMode ~/github/Pkg.jl/src/REPLMode/REPLMode.jl:393
 [11] do_cmds(input::String, io::IOContext{IO})
    @ Pkg.REPLMode ~/github/Pkg.jl/src/REPLMode/REPLMode.jl:383
 [12] pkgstr(str::String)
    @ Pkg.REPLMode ~/github/Pkg.jl/src/REPLMode/REPLMode.jl:450
 [13] top-level scope
    @ REPL[3]:1
Some type information was truncated. Use `show(err)` to see complete types.

@IanButterworth
Copy link
Member

Great. Seems like a big improvement.

A thought for a follow on PR would be adding something to the docs about why that could happen.

@IanButterworth IanButterworth merged commit bc51db7 into JuliaLang:master Aug 31, 2024
7 of 9 checks passed
@nhz2 nhz2 deleted the graceful-artifact-rename branch August 31, 2024 22:19
@IanButterworth IanButterworth mentioned this pull request Oct 4, 2024
14 tasks
IanButterworth pushed a commit that referenced this pull request Oct 4, 2024
(cherry picked from commit bc51db7)
IanButterworth pushed a commit that referenced this pull request Oct 4, 2024
(cherry picked from commit bc51db7)
@IanButterworth IanButterworth mentioned this pull request Oct 5, 2024
7 tasks
visr pushed a commit to visr/Pkg.jl that referenced this pull request Nov 28, 2024
(cherry picked from commit bc51db7)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Artifact folder could not be made on Windows.
5 participants